
Collaborating Authors

Virginia


Greedy Sampling Is Provably Efficient for RLHF

Neural Information Processing Systems

Reinforcement Learning from Human Feedback (RLHF) has emerged as a key technique for post-training large language models. Despite its empirical success, the theoretical understanding of RLHF is still limited, as learning the KL-regularized target with only preference feedback poses additional challenges compared with canonical RL. Existing works mostly study the reward-based Bradley-Terry (BT) preference model, and extend classical designs utilizing optimism or pessimism. This work, instead, considers the general preference model (whose practical relevance has been observed recently) and obtains performance guarantees with major, order-wise improvements over existing ones. Surprisingly, these results are derived from algorithms that directly use the empirical estimates (i.e., greedy sampling), as opposed to constructing optimistic or pessimistic estimates in previous works. This insight has a deep root in the unique structural property of the optimal policy class under the KL-regularized target, and we further specialize it to the BT model, highlighting the surprising sufficiency of greedy sampling in RLHF.
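In the reward-based BT special case, the KL-regularized target, max over π of E_π[r] − η·KL(π || π_ref), has the standard closed-form maximizer π(a) ∝ π_ref(a)·exp(r(a)/η). Read that way, "greedy sampling" means plugging an empirical reward estimate straight into this form, with no optimism or pessimism bonus added. The following is a minimal sketch for a single prompt with a finite set of responses. It assumes a maximum-likelihood BT fit computed by plain gradient ascent. The function names, `eta`, and the toy comparison data are illustrative, and the paper's general-preference-model algorithm is not reproduced here.

```python
import math
from typing import List, Tuple


def bt_prob(r_a: float, r_b: float) -> float:
    """Bradley-Terry probability that response a is preferred to b: sigmoid(r_a - r_b)."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))


def fit_bt_rewards(n: int, comparisons: List[Tuple[int, int]],
                   lr: float = 0.5, steps: int = 2000) -> List[float]:
    """Empirical (maximum-likelihood) BT rewards from (winner, loser) pairs.

    BT rewards are identified only up to an additive constant, so they are
    re-centered to mean zero after every gradient-ascent step.
    """
    r = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for w, l in comparisons:
            g = 1.0 - bt_prob(r[w], r[l])  # derivative of log sigmoid(r_w - r_l) w.r.t. r_w
            grad[w] += g
            grad[l] -= g
        r = [ri + lr * gi / len(comparisons) for ri, gi in zip(r, grad)]
        mean = sum(r) / n
        r = [ri - mean for ri in r]
    return r


def kl_regularized_policy(pi_ref: List[float], r_hat: List[float], eta: float) -> List[float]:
    """Greedy plug-in policy: pi(a) proportional to pi_ref(a) * exp(r_hat(a) / eta)."""
    logits = [math.log(p) + r / eta for p, r in zip(pi_ref, r_hat)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [wt / total for wt in weights]
```

Large `eta` keeps the policy close to `pi_ref`. Small `eta` concentrates it on the response with the highest estimated reward.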


Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions

Neural Information Processing Systems

We study Diffusion Schrödinger Bridge (DSB) models in the context of dynamical astrophysical systems, specifically tackling observational inverse prediction tasks within Giant Molecular Clouds (GMCs) for star formation. We introduce the AstroDSB model, a variant of DSB with the pairwise domain assumption tailored for astrophysical dynamics.
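The abstract does not give AstroDSB's equations. Bridge-style generative models on paired data commonly use the Brownian bridge as a building block. Given a paired sample (x0, x1), the intermediate state at time t ∈ [0, 1] is Gaussian, with mean (1 − t)·x0 + t·x1 and variance σ²·t·(1 − t) per coordinate. The sketch below draws such samples. It is a generic illustration, not AstroDSB's formulation: σ, the function name, and the toy data are assumptions, and the trained network that runs the bridge in reverse is omitted.

```python
import math
import random
from typing import List


def brownian_bridge_sample(x0: List[float], x1: List[float], t: float,
                           sigma: float, rng: random.Random) -> List[float]:
    """Sample x_t from a Brownian bridge pinned at x0 (t = 0) and x1 (t = 1).

    Mean: linear interpolation between the paired endpoints.
    Std:  sigma * sqrt(t * (1 - t)), which vanishes at both ends.
    """
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0) for a, b in zip(x0, x1)]
```

The noise is zero at both endpoints, so every sampled path connects exactly the two members of a pair. That property is what a paired-domain assumption makes available.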


Generating and Checking DNN Verification Proofs

Neural Information Processing Systems

Deep Neural Networks (DNNs) have emerged as an effective approach to implementing challenging subproblems. They are increasingly being used as components in critical transportation, medical, and military systems. However, like human-written software, DNNs may have flaws that can lead to unsafe system behavior. To confidently deploy DNNs in such systems, strong evidence is needed that they do not contain such flaws. This has led researchers to explore the adaptation and customization of software verification approaches to the problem of neural network verification (NNV). Many dozens of NNV tools have been developed in recent years, and as a field these techniques have matured to the point where realistic networks can be analyzed to detect flaws and to prove conformance with specifications. NNV tools, however, are highly engineered and complex, and may harbor flaws that cause them to produce unsound results. We identify commonalities in the algorithmic approaches taken by NNV tools to define a verifier-independent proof format, activation pattern tree proofs (APTP), and design an algorithm for checking those proofs that is proven correct and optimized to enable scalable checking. We demonstrate that existing verifiers can efficiently generate APTP proofs, that an APTP checker significantly outperforms prior work on a benchmark of 16 neural networks and 400 NNV problems, and that it is robust to variation in APTP proof structure arising from different NNV tools.
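The abstract does not specify the APTP format. The toy checker below only illustrates the idea that its name suggests: a tree of case splits on ReLU activation phases, where each leaf is discharged by a simpler direct check. The data layout, class names, and `check` function are hypothetical. The input is restricted to a scalar x, so each leaf's region is an interval and, with all phases fixed, the network is affine there. Under those conditions a leaf check reduces to evaluating two endpoints.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Leaf:
    """Leaf of the hypothetical proof tree; discharged by a direct check."""


@dataclass
class Split:
    """Case split on one hidden neuron: active (pre-activation >= 0) or inactive (<= 0)."""
    neuron: int
    active: "Node"
    inactive: "Node"


Node = Union[Leaf, Split]
Hidden = List[Tuple[float, float]]  # (w_i, b_i) for each hidden ReLU neuron


def restrict(lo: float, hi: float, w: float, b: float) -> Tuple[float, float]:
    """Intersect [lo, hi] with {x : w * x + b >= 0}; an empty result has lo > hi."""
    if w > 0:
        return max(lo, -b / w), hi
    if w < 0:
        return lo, min(hi, -b / w)
    return (lo, hi) if b >= 0 else (math.inf, -math.inf)


def check(node: Node, hidden: Hidden, v: List[float], c: float, bound: float,
          lo: float, hi: float, fixed: Optional[Dict[int, bool]] = None) -> bool:
    """True iff the tree proves c + sum_i v_i * relu(w_i x + b_i) <= bound on [lo, hi]."""
    fixed = {} if fixed is None else fixed
    if lo > hi:
        return True  # empty region: the property holds vacuously
    if isinstance(node, Split):
        i = node.neuron
        if i in fixed:
            return False  # malformed proof: same neuron split twice on one path
        w, b = hidden[i]
        a_lo, a_hi = restrict(lo, hi, w, b)    # active:   w x + b >= 0
        n_lo, n_hi = restrict(lo, hi, -w, -b)  # inactive: w x + b <= 0
        return (check(node.active, hidden, v, c, bound, a_lo, a_hi, {**fixed, i: True})
                and check(node.inactive, hidden, v, c, bound, n_lo, n_hi, {**fixed, i: False}))
    if len(fixed) != len(hidden):
        return False  # this toy format requires every leaf to fix all neuron phases

    def y(x: float) -> float:
        # With all phases fixed, relu(z) is z (active) or 0 (inactive), so y is affine in x.
        return c + sum(v[j] * (w * x + b) for j, (w, b) in enumerate(hidden) if fixed[j])

    # An affine function on an interval attains its maximum at an endpoint.
    return y(lo) <= bound and y(hi) <= bound
```

In this toy, each leaf check costs two function evaluations. A real checker over multi-dimensional inputs would typically need a linear-programming feasibility or bound check per leaf. That per-leaf cost is where scalability concerns arise.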


Compositional Neural Network Verification via Assume-Guarantee Reasoning

Neural Information Processing Systems

Verifying the behavior of neural networks is necessary if developers are to confidently deploy them as parts of mission-critical systems. Toward this end, researchers have been actively developing a range of increasingly sophisticated and scalable neural network verifiers. However, scaling verification to large networks is challenging, at least in part due to the significant memory requirements of verification algorithms. To address this challenge, we propose CoVeNN, an assume-guarantee compositional framework that is parameterized by an underlying verifier and generates a sequence of verification sub-problems. We present an iterative refinement-based strategy for computing assumptions that allow sub-problems to retain sufficient accuracy. An evaluation using 7 neural networks and a total of 140 property specifications demonstrates that CoVeNN can verify nearly 7 times more problems than state-of-the-art verifiers.
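The sketch below shows the general shape of assume-guarantee chaining. The network is cut into segments, and each segment is verified under an input assumption. The output box it is guaranteed to produce then becomes the next segment's assumption. Interval bound propagation is swapped in here as the underlying verifier purely for illustration. CoVeNN is parameterized by real verifiers, and this is not its implementation. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[List[float], List[float]]                # (lower bounds, upper bounds)
Layer = Tuple[List[List[float]], List[float], bool]  # (W, b, apply_relu)


def layer_bounds(layer: Layer, box: Box) -> Box:
    """Interval bound propagation through y = W x + b, optionally followed by ReLU."""
    W, b, relu = layer
    lo, hi = box
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        # A positive weight takes its minimum at the input's lower bound; a negative weight at the upper.
        l = bias + sum(w * (lo_j if w >= 0 else hi_j) for w, lo_j, hi_j in zip(row, lo, hi))
        u = bias + sum(w * (hi_j if w >= 0 else lo_j) for w, lo_j, hi_j in zip(row, lo, hi))
        if relu:
            l, u = max(0.0, l), max(0.0, u)
        out_lo.append(l)
        out_hi.append(u)
    return out_lo, out_hi


def segment_guarantee(segment: List[Layer], assumption: Box) -> Box:
    """Sub-problem: under an input assumption, return a box the segment's output is guaranteed to lie in."""
    box = assumption
    for layer in segment:
        box = layer_bounds(layer, box)
    return box


def verify_compositionally(segments: List[List[Layer]], input_box: Box,
                           out_upper: List[float]) -> Tuple[bool, List[Box]]:
    """Chain sub-problems; each guarantee becomes the next segment's assumption."""
    assumptions: List[Box] = []
    box = input_box
    for segment in segments:
        box = segment_guarantee(segment, box)
        assumptions.append(box)
    verified = all(u <= bound for u, bound in zip(box[1], out_upper))
    return verified, assumptions
```

Consider the two-segment example used in the test: inputs in [0, 1]², a ReLU layer with rows (1, −1) and (1, 1), then a sum. Here the true maximum output is 2, but the chained boxes only certify 3, so a bound of 2.5 fails. Coarse intermediate assumptions lose exactly this kind of accuracy, and the abstract describes its refinement strategy as retaining it.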


Fair Matroid Selection

Neural Information Processing Systems

We investigate the problem of sequentially selecting elements of an unknown matroid in an online manner to form an independent set, with the goal of maximizing the minimum probability of acceptance across all elements, a property we define as f-fairness. Under adversarial arrival orders, we design an α(ln k + 1)-fair algorithm, where α is the arboricity of the matroid and k is the rank, a result that is nearly optimal. For laminar matroids, we develop a (2α − 1)-fair algorithm, which is optimal up to constant factors, achieved through a novel online coloring scheme. In the random arrival order setting, we achieve a (4 + o(1))α-fair algorithm for graphic matroids, matching the optimal result up to constant factors, relying on a novel technique for learning a degeneracy ordering using a sampled subset of edges. We further generalize our result to p-matchoids, obtaining a β(p ln k + 1)-fair algorithm for the adversarial arrival model, where β is the optimal offline fairness. Notably, all our results can be extended to a setting with no prior knowledge of the matroid with only a logarithmic increase in the fairness factor.
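The arboricity α of a matroid is the minimum number of independent sets needed to cover its ground set. For a graphic matroid, that is the minimum number of forests covering the edges. This suggests a simple offline baseline: given such a partition, pick one class uniformly at random and accept only its elements, so that each element is accepted with probability exactly 1/α. The sketch assumes the forest partition is already known (the hypothetical `color` map) and uses union-find as the graphic matroid's independence oracle. It is not the paper's online algorithm, which works without that knowledge.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


class UnionFind:
    """Independence oracle for a graphic matroid: an edge set is independent iff it is a forest."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the components of a and b; return False if they were already connected."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def select_online(edges: List[Edge], color: Dict[Edge, int], chosen: int) -> List[Edge]:
    """In arrival order, accept each edge of the chosen color that keeps the accepted set a forest."""
    uf = UnionFind()
    accepted = []
    for e in edges:
        if color[e] == chosen and uf.union(*e):
            accepted.append(e)
    return accepted


def random_color_baseline(edges: List[Edge], color: Dict[Edge, int],
                          num_colors: int, rng: random.Random) -> List[Edge]:
    """Pick one color class uniformly; if every class is a forest, each edge is kept w.p. 1/num_colors."""
    return select_online(edges, color, rng.randrange(num_colors))
```

For example, K4 has arboricity 2 and splits into the paths 0-1-2-3 and 1-3-0-2. Each edge appears in exactly one of the two classes, so each is accepted with probability 1/2.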


Accelerating Chain of Thought Reasoning through Semantically Aligned Implicit Tokens

Neural Information Processing Systems

Chain-of-Thought (CoT) enhances the performance of Large Language Models (LLMs) on reasoning tasks by encouraging step-by-step solutions. However, the verbosity of CoT reasoning hinders its mass deployment in efficiency-critical applications. Recently, implicit CoT approaches have emerged, which encode reasoning steps within an LLM's hidden embeddings (termed "implicit reasoning") rather than in explicit tokens. This approach accelerates CoT reasoning by reducing the reasoning length and bypassing some LLM components. However, existing implicit CoT methods face two significant challenges: (1) they fail to preserve the semantic alignment between the implicit reasoning (when transformed to natural language) and the ground-truth reasoning, resulting in significant CoT performance degradation; and (2) they focus on reducing the length of the implicit reasoning but neglect the considerable time it takes an LLM to generate each individual implicit reasoning token.


Towards Provable Emergence of In-Context Reinforcement Learning

Neural Information Processing Systems

Typically, a modern reinforcement learning (RL) agent solves a task by updating its neural network parameters to adapt its policy to the task. Recently, it has been observed that some RL agents can solve a wide range of new out-of-distribution tasks without parameter updates after pretraining on some task distribution. When evaluated on a new task, instead of making parameter updates, the pretrained agent conditions its policy on additional input called the context, e.g., the agent's interaction history in the new task. The agent's performance increases as the information in the context increases, with the agent's parameters fixed. This phenomenon is typically called in-context RL (ICRL). The pretrained parameters of the agent network enable the remarkable ICRL phenomenon.
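The sketch below makes the ICRL interface concrete with a hand-written rule rather than a pretrained network. `in_context_policy` is a fixed function whose only input is the interaction history in the current task, i.e. the context. Any improvement during a rollout therefore comes from the growing context, not from parameter updates. The two-armed Bernoulli bandit, the function names, and the explore-once-then-greedy rule are all assumptions for illustration.

```python
import random
from typing import List, Tuple

History = List[Tuple[int, float]]  # (action, reward) pairs observed so far in the current task


def in_context_policy(history: History, num_arms: int) -> int:
    """Fixed map from context to action: try each unseen arm once, then pick the best empirical mean."""
    totals = [0.0] * num_arms
    counts = [0] * num_arms
    for a, r in history:
        totals[a] += r
        counts[a] += 1
    for a in range(num_arms):
        if counts[a] == 0:
            return a
    return max(range(num_arms), key=lambda a: totals[a] / counts[a])


def run_task(means: List[float], horizon: int, rng: random.Random) -> History:
    """Roll out the fixed policy in a new Bernoulli bandit task; only the context grows."""
    history: History = []
    for _ in range(horizon):
        a = in_context_policy(history, len(means))
        r = 1.0 if rng.random() < means[a] else 0.0
        history.append((a, r))
    return history
```

A purely greedy rule like this one can lock onto a worse arm when rewards are noisy. It illustrates the interface (fixed parameters, growing context) rather than a good in-context learner.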


Crypto Guys Bought the Answer to the CIA's Mysterious Kryptos Sculpture

WIRED

They swear they haven't peeked at the closely guarded secret and that they'll keep the cryptographic competition going. On a blustery March day, the artist Jim Sanborn received visitors at his studio on an isolated island in the Chesapeake Bay. The visitors sat him down in front of a laptop, and he typed in a secret message. They compressed the message using a unique hash function, sent that to the cloud, and wiped the laptop clean. Sanborn hoped that this action would set him free.


Interview with AAAI Fellow Sanmay Das: multiagent systems

AIHub

Each year the AAAI recognizes a group of individuals who have made significant, sustained contributions to the field of artificial intelligence by appointing them as Fellows. We're talking to some of the 2026 AAAI Fellows to find out more about their work. In this interview, we chat to Sanmay Das, who was elected as a Fellow.

Could you start with a quick introduction, where you work, and your general area of research?

Broadly speaking, I work in multiagent systems. I've done a lot of work at the intersection of AI and economics, and over the last decade or so I've thought a lot about projects in the AI for social impact and social good space. In particular, my interest has been in the allocation of scarce societal resources, thinking about how AI can be integrated, and what it tells us about systems where we don't necessarily want full free market resource allocation.


Start-ups are racing to revolutionise mathematics with AI

New Scientist

Mathematicians have never been so sought after by the world's richest people. At universities across the world, academics are seeing their colleagues mysteriously disappear and join private companies. Some of these companies are household names, like OpenAI and Google, but others are newly formed and just months old, hoping to capitalise on a moment in which mathematics is seen as the secret ingredient with which to improve artificial intelligence - which may in turn transform mathematics itself. "Last May, I was honestly kind of grieving for my scientific identity," says Ken Ono, who in 2025 went on leave from a professorship at the University of Virginia to join Axiom Math, a start-up aiming to build a maths-focused AI. Ono had been asked by a different company, called Epoch AI, to help craft a set of hard-to-solve maths problems that would test AI's problem-solving ability.